Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources

Authors

  • Valery Solovyev
  • Vladimir Ivanov
Abstract

Automatic event extraction from text is an important step in knowledge acquisition and knowledge base population. Developing an extraction system inevitably requires manual work, either in corpus annotation or in creating vocabularies and patterns for a knowledge-based system. Recent work has focused on adapting existing systems (for extraction from English texts) to new domains. Event extraction in other languages has not been studied, owing to the lack of the resources and algorithms necessary for natural language processing. In this paper we define a set of linguistic resources that are necessary for developing a knowledge-based event extraction system for Russian: a vocabulary of subordination models, a vocabulary of event triggers, and a vocabulary of Frame Elements that are basic building blocks for semantic patterns. We propose a set of methods for creating such vocabularies for Russian and other languages using the Google Books NGram Corpus. The methods are evaluated in the development of an event extraction system for Russian.
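
The abstract does not say how the vocabularies are derived from the Google Books NGram Corpus. As an illustration only, and not the authors' method, the sketch below shows one way candidate dependents of a trigger word might be collected from 2-gram records. It assumes the corpus's tab-separated line layout (ngram, year, match_count, volume_count); the function names `aggregate_bigrams` and `top_dependents` and all sample counts are hypothetical.

```python
from collections import defaultdict


def aggregate_bigrams(lines, head_words):
    """Sum match counts over all years for 2-grams whose first token is in head_words.

    Each line is assumed to look like: "ngram<TAB>year<TAB>match_count<TAB>volume_count".
    Lines that do not have exactly four tab-separated fields are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # not a well-formed record
        ngram, _year, match_count, _volume_count = parts
        tokens = ngram.split(" ")
        if len(tokens) != 2 or tokens[0] not in head_words:
            continue  # keep only 2-grams headed by a word of interest
        totals[(tokens[0], tokens[1])] += int(match_count)
    return dict(totals)


def top_dependents(totals, head, k=3):
    """Return up to k (dependent, count) pairs for one head word, most frequent first."""
    ranked = sorted(
        ((dep, n) for (h, dep), n in totals.items() if h == head),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:k]


# Toy records with invented counts, for illustration only.
sample = [
    "назначил директором\t1990\t12\t10",
    "назначил директором\t1991\t8\t7",
    "назначил встречу\t1990\t5\t5",
    "купил дом\t1990\t9\t9",
    "malformed line",
]

totals = aggregate_bigrams(sample, {"назначил"})
candidates = top_dependents(totals, "назначил")
```

A frequency list of this kind would only be raw material; deciding which dependents belong in a subordination model or count as Frame Elements would still need linguistic filtering and manual review.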

Similar resources

Integration of Russian Language Resources

In this paper we describe the creation of large-scale linguistic resources for the Russian language. An Internet/intranet system architecture was developed to make a large volume of Russian-language lexical information, corpora (texts) and a knowledge base (Russian WordNet) available to the system at development and/or run time. There are four linguistic counterparts, corresponding to the major categori...

Collection, Annotation and Analysis of Gold Standard Corpora for Knowledge-Rich Context Extraction in Russian and German

This paper describes the collection, annotation and linguistic analysis of a gold standard for knowledge-rich context extraction on the basis of Russian and German web corpora as part of ongoing PhD thesis work. In the following sections, the concept of knowledge-rich contexts is refined and gold standard creation is described. Linguistic analyses of the gold standard data and their results are...

Employing Event Inference to Improve Semi-Supervised Chinese Event Extraction

Although a semi-supervised model can extract the event mentions that match frequent event patterns, it performs poorly on event mentions that match infrequent patterns or have no matching pattern. To solve this issue, this paper introduces various kinds of linguistic knowledge-driven event inference mechanisms to semi-supervised Chinese event extraction. These event inference mechanisms can ...

Incremental Chinese Lexicon Extraction with Minimal Resources on a Domain-Specific Corpus

This article presents an original lexical unit extraction system for Chinese. The method is based on an incremental process driven by an association score, featuring a minimal-resource, statistically aided linguistic approach. We also introduce a linguistics-based lexical unit definition and use it to describe an evaluation protocol dedicated to the task. The experimental results on a domain spe...

Extraction of Knowledge-Rich Contexts in Russian - A Study in the Automotive Domain

This paper presents ongoing research aiming at the automated extraction of knowledge-rich contexts (KRCs) from a Russian language corpus. The notion of KRCs was introduced by Meyer (2001) and refers to a term’s co-text (Sebeok, 1986) as a reservoir of potentially important information about a concept. From a terminological point of view, it seems that KRCs contain exactly the kind of informatio...

Journal title:

Volume 2016, Issue

Pages -

Publication date: 2016